Apache Falcon (1): Installation

With big data becoming more and more popular, big data security and governance is also drawing increasing attention. This post introduces Apache Falcon, a data processing and management framework that is a top-level Apache project, starting with its installation.

1. Modifying the Hadoop configuration

1.1 Modify yarn-site.xml

Target machines

On nodes 主机-1, 主机-2, and 主机-3, as the hdfs user, in the /var/local/hadoop/hadoop-2.6.0/etc/hadoop directory

Commands

vim yarn-site.xml

Add the following properties inside the <configuration> tag:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>/var/local/hadoop/hadoop-2.6.0/etc/hadoop/fair-scheduler.xml</value>
</property>
<property>
  <name>mapred.fairscheduler.preemption</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.assignmultiple</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>mapred.job.queue.name</value>
  <description>job.set("mapred.job.queue.name",pool);</description>
</property>
<property>
  <name>mapred.fairscheduler.preemption.only.log</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.preemption.interval</name>
  <value>15000</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>default,hadoop,hive</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>20960</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
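
The same change has to be present on all three nodes. One way to avoid editing the file by hand on every machine is to edit it once on 主机-1, push it out, and restart the YARN daemons so the scheduler settings take effect. This is only a sketch; it assumes the ResourceManager runs on 主机-1 and that a NodeManager runs on every node.

# on 主机-1, after editing yarn-site.xml
scp /var/local/hadoop/hadoop-2.6.0/etc/hadoop/yarn-site.xml 主机-2:/var/local/hadoop/hadoop-2.6.0/etc/hadoop/
scp /var/local/hadoop/hadoop-2.6.0/etc/hadoop/yarn-site.xml 主机-3:/var/local/hadoop/hadoop-2.6.0/etc/hadoop/

# restart the ResourceManager on 主机-1
$HADOOP_HOME/sbin/yarn-daemon.sh stop resourcemanager
$HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager

# restart the NodeManager on each node
$HADOOP_HOME/sbin/yarn-daemon.sh stop nodemanager
$HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager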

1.2 Add the fair scheduling policy (fair-scheduler.xml)

Falcon generates a large number of MapReduce jobs as it runs, so the YARN scheduler needs some tuning to keep the load balanced; it is therefore best to add a fair scheduling policy here.

Target machines

On nodes 主机-1, 主机-2, and 主机-3, as the hdfs user, in the /var/local/hadoop/hadoop-2.6.0/etc/hadoop directory

Commands

vim fair-scheduler.xml

Add the following content to the file:

<?xml version="1.0"?>
<allocations>
  <pool name="hive">
    <minMaps>90</minMaps>
    <minReduces>20</minReduces>
    <maxRunningJobs>20</maxRunningJobs>
    <weight>2.0</weight>
    <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
  </pool>

  <pool name="hadoop">
    <minMaps>9</minMaps>
    <minReduces>2</minReduces>
    <maxRunningJobs>20</maxRunningJobs>
    <weight>1.0</weight>
    <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
  </pool>

  <user name="hadoop">
    <maxRunningJobs>6</maxRunningJobs>
  </user>
  <poolMaxJobsDefault>10</poolMaxJobsDefault>
  <userMaxJobsDefault>8</userMaxJobsDefault>
  <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
  <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
</allocations>

Configuration notes

minResources: the guaranteed minimum resources for a queue, in the format "X mb,Y vcores". The sum of the minimum guarantees across all queues should preferably not exceed the total memory available to YARN.

maxResources: the maximum resources a queue may use; the Fair Scheduler ensures a queue never uses more than this limit.

maxRunningApps: the maximum number of applications that may run concurrently in the queue.

schedulingPolicy: the scheduling policy used by the queue; fifo, fair, and drf are supported.

aclSubmitApps: the users and groups allowed to submit applications to the queue; the default is *. Multiple entries are given as a comma-separated user list, a space, then a comma-separated group list, e.g. hadoopuser,sparkuser hadoopgroup.

aclAdministerApps: the list of administrators for the queue; an administrator can kill any application in the queue.

userMaxAppsDefault: the default maximum number of concurrently running applications per user.

Notes:
Only submit to leaf queues.
maxRunningApps is very useful and should be set according to the memory currently available in the cluster.
Behavior differs across Hadoop versions; if the value is set too high and many YARN applications are submitted in a short time, YARN resources are exhausted quickly and every job then waits a long time for its tasks to be scheduled.
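
For reference, the parameters described above belong to the YARN FairScheduler allocation format, while the pools defined earlier use the older MRv1-style elements. A minimal sketch of a queue written in the YARN-style format is shown below; the queue name and resource figures are illustrative assumptions, not values prescribed by this guide.

<?xml version="1.0"?>
<allocations>
  <!-- illustrative queue; size minResources/maxResources to your cluster -->
  <queue name="hive">
    <minResources>4096 mb,4 vcores</minResources>
    <maxResources>16384 mb,16 vcores</maxResources>
    <maxRunningApps>20</maxRunningApps>
    <weight>2.0</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <aclSubmitApps>hadoopuser,sparkuser hadoopgroup</aclSubmitApps>
  </queue>
  <userMaxAppsDefault>8</userMaxAppsDefault>
</allocations>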

2. Building and deploying Oozie 4.2.0

Oozie is the scheduler that Falcon drives at runtime, so it is required for Falcon to run properly.

2.1 Build Oozie

Target machine

On the cluster namenode node 主机-1, as the hdfs user, in the /home/hdfs directory

Commands

wget http://mirror.bit.edu.cn/apache/oozie/4.2.0/oozie-4.2.0.tar.gz

Download the Oozie source code.

tar zxvf oozie-4.2.0.tar.gz
cd oozie-4.2.0

Extract oozie-4.2.0 and enter the Oozie source directory.

bin/mkdistro.sh -DskipTests -Phadoop-2 -Dhadoop.auth.version=2.6.0 -Ddistcp.version=2.6.0 -Dhive.version=1.2.1 -Dsqoop.version=1.4.6

Build Oozie 4.2.0. When the build finishes, the Oozie distribution tarball oozie-4.2.0-distro.tar.gz is produced under distro/target/.

tar -zxf distro/target/oozie-4.2.0-distro.tar.gz
mv oozie-4.2.0 /var/local/hadoop/

Extract oozie-4.2.0-distro.tar.gz and move the resulting oozie-4.2.0 directory to /var/local/hadoop.

2.2 Modify the HDFS configuration

Target machine

On the cluster namenode node 主机-1, as the hdfs user.

Commands

vim /var/local/hadoop/hadoop-2.6.0/etc/hadoop/core-site.xml

Edit the Hadoop core-site.xml file and add the following configuration:

<property>
  <name>hadoop.proxyuser.hdfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hdfs.groups</name>
  <value>*</value>
</property>

Here hdfs is the username that will later run Oozie.

hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration

Make the configuration take effect without restarting the Hadoop cluster.
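
Before moving on, it can help to confirm that the new proxyuser values are visible in the configuration. This is only a quick sanity check; hdfs getconf reads the configuration files on the current node, while the two refresh commands above push the change to the running daemons.

hdfs getconf -confKey hadoop.proxyuser.hdfs.hosts
hdfs getconf -confKey hadoop.proxyuser.hdfs.groups

Both commands should print *.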

2.3 Add the Oozie libext packages

Target machine

On the cluster namenode node 主机-1, as the hdfs user.

Commands

cd /var/local/hadoop/oozie-4.2.0
mkdir libext
tar zxvf oozie-sharelib-4.2.0.tar.gz

In oozie-4.2.0, create the libext directory and extract the Oozie sharelib.

cp $HADOOP_HOME/share/hadoop/*/*.jar libext/ 
cp $HADOOP_HOME/share/hadoop/*/lib/*.jar libext/
cp $HIVE_HOME/lib/*.jar libext/ 
cp share/lib/hcatalog/*.jar libext/

Copy the Hadoop, Hive, and HCatalog jars into the Oozie libext directory.

cd libext
mv servlet-api-2.5.jar servlet-api-2.5.jar.bak 
mv jsp-api-2.1.jar jsp-api-2.1.jar.bak 
mv jasper-compiler-5.5.23.jar jasper-compiler-5.5.23.jar.bak 
mv jasper-runtime-5.5.23.jar jasper-runtime-5.5.23.jar.bak

Move aside (disable) the jars that conflict with Oozie's bundled Tomcat.

Upload ext-2.2.zip to the $OOZIE_HOME/libext/ directory (search for and download this file yourself).

wget http://mirror.bit.edu.cn/mysql/Downloads/Connector-J/mysql-connector-java-5.1.38.tar.gz
tar zxvf mysql-connector-java-5.1.38.tar.gz
cp mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar /var/local/hadoop/oozie-4.2.0/libext

Download the MySQL JDBC driver and place it in the libext/ directory.
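
Before packaging the WAR in section 2.8, it is worth checking that the expected files actually ended up in libext/. A quick, non-authoritative check:

cd /var/local/hadoop/oozie-4.2.0
ls libext/ | grep -Ei 'ext-2.2|mysql-connector'
ls libext/ | wc -l

The first listing should show ext-2.2.zip and the MySQL driver jar; the second simply shows how many files were collected.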

2.4 Add Oozie configuration properties

Target machine

On the cluster namenode node 主机-1, as the hdfs user

Commands

cd /var/local/hadoop/oozie-4.2.0
vim conf/oozie-site.xml

Edit the Oozie configuration file and add the following:

<property>
  <name>oozie.service.JPAService.create.db.schema</name>
  <value>true</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.driver</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.url</name>
  <value>jdbc:mysql://主机-1:3306/oozie?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.username</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.JPAService.jdbc.password</name>
  <value>oozie</value>
</property>
<property>
  <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
  <value>*=/var/local/hadoop/hadoop-2.6.0/etc/hadoop</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hdfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>oozie.service.ProxyUserService.proxyuser.hdfs.groups</name>
  <value>*</value>
</property>

Notes

The oozie.service.HadoopAccessorService.hadoop.configurations property points to the etc/hadoop directory under $HADOOP_HOME. 主机-1 is the node that runs Oozie, and hdfs is the username that runs Oozie, consistent with the Hadoop core-site.xml configuration above.

2.5 Add the MySQL user

Target machine

On the cluster namenode node 主机-1, as the hdfs user

Commands

mysql -uroot -padmin

Enter the MySQL console as the administrator.

create database oozie;

Create a database named oozie.

grant all privileges on oozie.* to 'oozie'@'localhost' identified by 'oozie';

Grant access to the oozie database by creating a user named oozie with password oozie.

grant all privileges on oozie.* to 'oozie'@'%' identified by 'oozie';

Grant access to the oozie database from any host.

update mysql.user set host='%' where user='root' and host='localhost';
insert into mysql.user (host,user,password) values('主机-1','oozie',PASSWORD('oozie'));

Set up authentication for the oozie user.

FLUSH PRIVILEGES;
quit 

Flush the privileges and exit the MySQL console.

Restart MySQL so the configuration takes effect:

sudo service mysqld restart
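
A quick way to confirm that the oozie account and its grants work is to connect with it directly; this assumes the mysql client is available on 主机-1, as used above.

mysql -uoozie -poozie -h 主机-1 -e "SHOW DATABASES LIKE 'oozie';"

If the grants are correct, the output lists the oozie database.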

2.6 Configure the Oozie environment variables

Target machines:

On 主机-1, 主机-2, and 主机-3, as the hdfs user, from any directory

Commands:

sudo vim /etc/profile

On 主机-1, 主机-2, and 主机-3, append the following to the end of the file:

export OOZIE_HOME=/var/local/hadoop/oozie-4.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$OOZIE_HOME/bin

2.7 Refresh the environment variables

Target machines:

On 主机-1, 主机-2, and 主机-3, in the current terminal, from any directory

Commands:

source /etc/profile
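
To confirm that the environment variables took effect, the Oozie client can be called directly; oozie version only reports the client build and does not require the server to be running.

echo $OOZIE_HOME
oozie version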

2.8 Deploy Oozie

Target machine

On the cluster namenode node 主机-1, as the hdfs user

Commands

cd /var/local/hadoop/oozie-4.2.0
bin/oozie-setup.sh prepare-war

Build the Oozie WAR package.

bin/ooziedb.sh create -sqlfile oozie.sql -run

Initialize the Oozie database.

vim oozie-server/conf/server.xml

Edit the server-side oozie-server/conf/server.xml file and comment out the conflicting entry.

bin/oozie-setup.sh sharelib create -fs hdfs://主机-1:9000

Upload the jars from the Oozie sharelib to HDFS.
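
Optionally, confirm that the sharelib really landed in HDFS. By default sharelib create uploads under the submitting user's home directory, so with the hdfs user the path below is the expected location; adjust it if your layout differs.

hdfs dfs -ls /user/hdfs/share/lib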

2.9 Start and test Oozie

Target machine

On the cluster namenode node 主机-1, as the hdfs user

Commands

jps 

List the running Java processes. If the JobHistoryServer is not in the list, start it with the following command:
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

Afterwards, run jps again to check the result.

cd /var/local/hadoop/oozie-4.2.0
bin/oozied.sh start

Start the Oozie service.

oozie admin -oozie http://主机-1:11000/oozie -status

Check whether the service started correctly: if the output shows "System mode: NORMAL", the start succeeded; otherwise it failed.

Notes

You can open http://主机-1:11000/oozie/ in a browser to reach the Oozie web console and view Oozie's status, where 主机-1 is the IP address of the node on which Oozie is installed and running.
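
Besides the web console, the Oozie REST API can be queried from the command line, which is convenient on a headless node. A minimal check (the exact JSON layout may vary slightly between Oozie versions):

curl http://主机-1:11000/oozie/v1/admin/status

A healthy server responds with something like {"systemMode":"NORMAL"}.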

3. Building and deploying Falcon

3.1 Build Falcon

Target machine

On the cluster namenode node 主机-1, as the hdfs user, in the /home/hdfs directory

Commands

wget http://mirror.bit.edu.cn/apache/falcon/0.9/apache-falcon-0.9-sources.tar.gz

Download the Falcon source code.

tar -zxvf apache-falcon-0.9-sources.tar.gz
cd falcon-sources-0.9/

Extract the sources into falcon-sources-0.9 and enter the directory.

export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m -noverify" && mvn clean install -Dhadoop.version=2.6.0 -Doozie.version=4.2.0 -DskipTests

Build and package the Falcon source. If an npm error occurs during the build, install npm with the following command:

sudo yum -y install npm

Install npm and use a domestic (China) mirror:

npm --registry https://registry.npm.taobao.org info underscore
mvn clean assembly:assembly -DskipTests -DskipCheck=true

After the build finishes, apache-falcon-0.9-bin.tar.gz and apache-falcon-0.9-bin.zip are available under the target/ directory.

tar -zxvf target/apache-falcon-0.9-bin.tar.gz
mv falcon-0.9 /var/local/hadoop/

Extract the Falcon package and move the resulting falcon-0.9 directory to /var/local/hadoop.

3.2 Modify the Oozie configuration

Target machine

On the cluster namenode node 主机-1, as the hdfs user

Commands

cd /var/local/hadoop/oozie-4.2.0
vim conf/oozie-site.xml

Add the following to Oozie's oozie-site.xml configuration file:

<!-- Oozie EL Extension configurations for falcon -->
<property>
    <name>oozie.service.ELService.ext.functions.coord-job-submit-instances</name>
    <value>
        now=org.apache.oozie.extensions.OozieELExtensions#ph1_now_echo,
        today=org.apache.oozie.extensions.OozieELExtensions#ph1_today_echo,
        yesterday=org.apache.oozie.extensions.OozieELExtensions#ph1_yesterday_echo,
        currentWeek=org.apache.oozie.extensions.OozieELExtensions#ph1_currentWeek_echo,
        lastWeek=org.apache.oozie.extensions.OozieELExtensions#ph1_lastWeek_echo,
        currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_currentMonth_echo,
        lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_lastMonth_echo,
        currentYear=org.apache.oozie.extensions.OozieELExtensions#ph1_currentYear_echo,
        lastYear=org.apache.oozie.extensions.OozieELExtensions#ph1_lastYear_echo,
        formatTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_formatTime_echo,
        latest=org.apache.oozie.coord.CoordELFunctions#ph2_coord_latest_echo,
        future=org.apache.oozie.coord.CoordELFunctions#ph2_coord_future_echo
    </value>
    <description>
        EL functions declarations, separated by commas, format is [PREFIX:]NAME=CLASS#METHOD.
        This property is a convenience property to add extensions to the built in
        executors without having to
        include all the built in ones.
    </description>
</property>
<property>
    <name>oozie.service.ELService.ext.functions.coord-action-create-inst</name>
    <value>
        now=org.apache.oozie.extensions.OozieELExtensions#ph2_now_inst,
        today=org.apache.oozie.extensions.OozieELExtensions#ph2_today_inst,
        yesterday=org.apache.oozie.extensions.OozieELExtensions#ph2_yesterday_inst,
        currentWeek=org.apache.oozie.extensions.OozieELExtensions#ph2_currentWeek_inst,
        lastWeek=org.apache.oozie.extensions.OozieELExtensions#ph2_lastWeek_inst,
        currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_currentMonth_inst,
        lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_lastMonth_inst,
        currentYear=org.apache.oozie.extensions.OozieELExtensions#ph2_currentYear_inst,
        lastYear=org.apache.oozie.extensions.OozieELExtensions#ph2_lastYear_inst,
        latest=org.apache.oozie.coord.CoordELFunctions#ph2_coord_latest_echo,
        future=org.apache.oozie.coord.CoordELFunctions#ph2_coord_future_echo,
        formatTime=org.apache.oozie.coord.CoordELFunctions#ph2_coord_formatTime,
        user=org.apache.oozie.coord.CoordELFunctions#coord_user
    </value>
    <description>
        EL functions declarations, separated by commas, format is [PREFIX:]NAME=CLASS#METHOD.
        This property is a convenience property to add extensions to the built in
        executors without having to
        include all the built in ones.
    </description>
</property>
<property>
    <name>oozie.service.ELService.ext.functions.coord-action-create</name>
    <value>
        now=org.apache.oozie.extensions.OozieELExtensions#ph2_now,
        today=org.apache.oozie.extensions.OozieELExtensions#ph2_today,
        yesterday=org.apache.oozie.extensions.OozieELExtensions#ph2_yesterday,
        currentWeek=org.apache.oozie.extensions.OozieELExtensions#ph2_currentWeek,
        lastWeek=org.apache.oozie.extensions.OozieELExtensions#ph2_lastWeek,
        currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_currentMonth,
        lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_lastMonth,
        currentYear=org.apache.oozie.extensions.OozieELExtensions#ph2_currentYear,
        lastYear=org.apache.oozie.extensions.OozieELExtensions#ph2_lastYear,
        latest=org.apache.oozie.coord.CoordELFunctions#ph2_coord_latest_echo,
        future=org.apache.oozie.coord.CoordELFunctions#ph2_coord_future_echo,
        formatTime=org.apache.oozie.coord.CoordELFunctions#ph2_coord_formatTime,
        user=org.apache.oozie.coord.CoordELFunctions#coord_user
    </value>
    <description>
        EL functions declarations, separated by commas, format is [PREFIX:]NAME=CLASS#METHOD.
        This property is a convenience property to add extensions to the built in
        executors without having to
        include all the built in ones.
    </description>
</property>
<property>
    <name>oozie.service.ELService.ext.functions.coord-job-submit-data</name>
    <value>
        now=org.apache.oozie.extensions.OozieELExtensions#ph1_now_echo,
        today=org.apache.oozie.extensions.OozieELExtensions#ph1_today_echo,
        yesterday=org.apache.oozie.extensions.OozieELExtensions#ph1_yesterday_echo,
        currentWeek=org.apache.oozie.extensions.OozieELExtensions#ph1_currentWeek_echo,
        lastWeek=org.apache.oozie.extensions.OozieELExtensions#ph1_lastWeek_echo,
        currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_currentMonth_echo,
        lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph1_lastMonth_echo,
        currentYear=org.apache.oozie.extensions.OozieELExtensions#ph1_currentYear_echo,
        lastYear=org.apache.oozie.extensions.OozieELExtensions#ph1_lastYear_echo,
        dataIn=org.apache.oozie.extensions.OozieELExtensions#ph1_dataIn_echo,
        instanceTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_nominalTime_echo_wrap,
        formatTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_formatTime_echo,
        dateOffset=org.apache.oozie.coord.CoordELFunctions#ph1_coord_dateOffset_echo,
        user=org.apache.oozie.coord.CoordELFunctions#coord_user
    </value>
    <description>
        EL constant declarations, separated by commas, format is [PREFIX:]NAME=CLASS#CONSTANT.
        This property is a convenience property to add extensions to the built in
        executors without having to
        include all the built in ones.
    </description>
</property>
<property>
    <name>oozie.service.ELService.ext.functions.coord-action-start</name>
    <value>
        now=org.apache.oozie.extensions.OozieELExtensions#ph2_now,
        today=org.apache.oozie.extensions.OozieELExtensions#ph2_today,
        yesterday=org.apache.oozie.extensions.OozieELExtensions#ph2_yesterday,
        currentWeek=org.apache.oozie.extensions.OozieELExtensions#ph2_currentWeek,
        lastWeek=org.apache.oozie.extensions.OozieELExtensions#ph2_lastWeek,
        currentMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_currentMonth,
        lastMonth=org.apache.oozie.extensions.OozieELExtensions#ph2_lastMonth,
        currentYear=org.apache.oozie.extensions.OozieELExtensions#ph2_currentYear,
        lastYear=org.apache.oozie.extensions.OozieELExtensions#ph2_lastYear,
        latest=org.apache.oozie.coord.CoordELFunctions#ph3_coord_latest,
        future=org.apache.oozie.coord.CoordELFunctions#ph3_coord_future,
        dataIn=org.apache.oozie.extensions.OozieELExtensions#ph3_dataIn,
        instanceTime=org.apache.oozie.coord.CoordELFunctions#ph3_coord_nominalTime,
        dateOffset=org.apache.oozie.coord.CoordELFunctions#ph3_coord_dateOffset,
        formatTime=org.apache.oozie.coord.CoordELFunctions#ph3_coord_formatTime,
        user=org.apache.oozie.coord.CoordELFunctions#coord_user
    </value>
    <description>
        EL functions declarations, separated by commas, format is [PREFIX:]NAME=CLASS#METHOD.
        This property is a convenience property to add extensions to the built in
        executors without having to
        include all the built in ones.
    </description>
</property>
<property>
    <name>oozie.service.ELService.ext.functions.coord-sla-submit</name>
    <value>
        instanceTime=org.apache.oozie.coord.CoordELFunctions#ph1_coord_nominalTime_echo_fixed,
        user=org.apache.oozie.coord.CoordELFunctions#coord_user
    </value>
    <description>
        EL functions declarations, separated by commas, format is [PREFIX:]NAME=CLASS#METHOD.
    </description>
</property>
<property>
    <name>oozie.service.ELService.ext.functions.coord-sla-create</name>
    <value>
        instanceTime=org.apache.oozie.coord.CoordELFunctions#ph2_coord_nominalTime,
        user=org.apache.oozie.coord.CoordELFunctions#coord_user
    </value>
    <description>
        EL functions declarations, separated by commas, format is [PREFIX:]NAME=CLASS#METHOD.
    </description>
</property>
<!-- Required to Notify Falcon on Workflow job status. -->
<property>
    <name>oozie.services.ext</name>
    <value>
        org.apache.oozie.service.JMSAccessorService,
        org.apache.oozie.service.JMSTopicService,
        org.apache.oozie.service.EventHandlerService
    </value>
</property>
<property>
    <name>oozie.service.EventHandlerService.event.listeners</name>
    <value>
        org.apache.oozie.jms.JMSJobEventListener
    </value>
</property>
<property>
    <name>oozie.jms.producer.connection.properties</name>
    <value>
        java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://主机-1:61616
    </value>
</property>
<property>
    <name>oozie.service.JMSTopicService.topic.name</name>
    <value>
        WORKFLOW=ENTITY.TOPIC, COORDINATOR=ENTITY.TOPIC
    </value>
    <description>
        Topic options are ${username} or a fixed string which can be specified as default or for a
        particular job type.
        For e.g. to have a fixed string topic for workflows, coordinators and bundles,
        specify in the following comma-separated format: {jobtype1}={some_string1}, {jobtype2}={some_string2}
        where job type can be WORKFLOW, COORDINATOR or BUNDLE.
        The following example defines topics for workflow job, workflow action, coordinator job, coordinator action,
        bundle job and bundle action:
        WORKFLOW=workflow,
        COORDINATOR=coordinator,
        BUNDLE=bundle
        For jobs with no defined topic, the default topic will be ${username}.
    </description>
</property>
<property>
    <name>oozie.service.JMSTopicService.topic.prefix</name>
    <value>FALCON.</value>
    <description>
        This can be used to append a prefix to the topic in oozie.service.JMSTopicService.topic.name. For eg: oozie.
    </description>
</property>

Notes

Add the properties from $FALCON_HOME/oozie/conf/oozie-site.xml to $OOZIE_HOME/conf/oozie-site.xml, and change the broker address in Oozie's JMS connection property oozie.jms.producer.connection.properties to the node where Oozie runs, i.e. 主机-1.
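
Because this step pastes a large block of XML into an existing file, it is easy to end up with malformed markup. If xmllint is installed, a quick well-formedness check plus a grep for the new EL properties catches most mistakes; this is only a convenience check, not part of the official procedure.

cd /var/local/hadoop/oozie-4.2.0
xmllint --noout conf/oozie-site.xml && echo "oozie-site.xml is well-formed"
grep -c 'oozie.service.ELService.ext.functions' conf/oozie-site.xml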

3.3 Add the Falcon jars to the Oozie libraries

Target machine

On the cluster namenode node 主机-1, as the hdfs user

Commands

cd /var/local/hadoop/oozie-4.2.0    
cp /var/local/hadoop/falcon-0.9/oozie/libext/*.jar libext/

Copy the Falcon extension jars from Falcon's oozie directory into the $OOZIE_HOME/libext folder.

bin/oozie-stop.sh
bin/oozie-setup.sh prepare-war
bin/oozie-start.sh

Rebuild the WAR and restart Oozie.
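
A simple way to confirm this step worked is to check that the Falcon extension jars are in libext/ and that Oozie came back up cleanly; the status command below is the same one used in section 2.9.

ls libext/ | grep -i falcon
oozie admin -oozie http://主机-1:11000/oozie -status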

3.4 Falcon client configuration

Target machine

On the cluster namenode node 主机-1, as the hdfs user

Commands

cd /var/local/hadoop/falcon-0.9
vim conf/client.properties

Change the value of falcon.url in client.properties to:

falcon.url=https://{主机-1}:{port}/

Notes

falcon.url specifies the address of the Falcon server, in this case the IP address of 主机-1; port is the port configured when Falcon starts, 15443 by default.
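
For example, with the Falcon server on 主机-1 and the default port, the resolved line would be expected to look like:

falcon.url=https://主机-1:15443/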

3.5 Modify the Falcon configuration file

Target machine

On the cluster namenode node 主机-1, as the hdfs user

Commands

cd /var/local/hadoop/falcon-0.9
vim conf/startup.properties

Change the value of *.broker.url as follows:

*.broker.url=tcp://主机-1:61616

Notes

*.broker.url is the address of Falcon's embedded ActiveMQ broker, i.e. the node where Falcon runs; in this example it is the IP address of 主机-1.

3.6 Configure the Falcon environment variables

Target machines:

On 主机-1, 主机-2, and 主机-3, as the hdfs user, from any directory

Commands:

sudo vim /etc/profile

On 主机-1, 主机-2, and 主机-3, append the following to the end of the file:

export FALCON_HOME=/var/local/hadoop/falcon-0.9
export PATH=$PATH:$HADOOP_HOME/bin:$FALCON_HOME/bin

3.7 Refresh the environment variables

Target machines:

On 主机-1, 主机-2, and 主机-3, in the current terminal, from any directory

Commands:

source /etc/profile
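
As with Oozie, a quick check that the variables are in effect:

echo $FALCON_HOME
which falcon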

3.8 Create the Falcon client

Target machine

On the cluster namenode node 主机-1, as the root user

Commands

useradd -U -m falcon-dashboard -G users
groups falcon-dashboard

If the output shows falcon-dashboard : falcon-dashboard users, the user was created successfully.

scp -r /var/local/hadoop/falcon-0.9 主机@主机-client:/home/主机/

Copy Falcon to the client node.

Notes

When copying to the client you will be prompted for the password of the 主机 user on the client node.

3.9 Start Falcon

Target machine

On the cluster namenode node 主机-1, as the hdfs user

Commands

cd /var/local/hadoop/falcon-0.9

bin/falcon-start

Start the Falcon server.

jps

List the Java processes; if FalconServer appears in the list, the start was successful.

Notes

From the client you can open https://主机-1:15443 in a browser to reach the Falcon web console. Note that the Falcon server uses HTTPS; entering an http address will result in an error.
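
The Falcon command-line client configured in section 3.4 can also be used to check the server; the exact wording of the output differs between Falcon releases, so treat the commands below as a sketch.

falcon admin -version
falcon admin -status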
